Higher-Order Regret Bounds with Switching Costs
نویسنده
چکیده
This work examines online linear optimization with full information and switching costs (SCs) and focuses on regret bounds that depend on properties of the loss sequences. The SCs considered are bounded functions of a pair of decisions, and regret is augmented with the total SC. We show under general conditions that for any normed SC, σ(x,x) = ‖x−x′‖, regret cannot be bounded given only a bound Q on the quadratic variation of losses. With an additional bound Λ on the total length of losses, we prove O( √ Q+ Λ) regret for Regularized Follow the Leader (RFTL). Furthermore, an O( √ Q) bound holds for RFTL given a cost ‖x − x′‖2. By generalizing the Shrinking Dartboard algorithm, we also show an expected regret bound for the best expert setting with any SC, given bounds on the total loss of the best expert and the quadratic variation of any expert. As SCs vanish, all our bounds depend purely on quadratic variation. We apply our results to pricing options in an arbitrage-free market with proportional transaction costs. In particular, we upper bound the price of “at the money” call options, assuming bounds on the quadratic variation of a stock price and the minimum of summed gains and summed losses.
منابع مشابه
Online Learning with Switching Costs and Other Adaptive Adversaries
We study the power of different types of adaptive (nonoblivious) adversaries in the setting of prediction with expert advice, under both full-information and bandit feedback. We measure the player’s performance using a new notion of regret, also known as policy regret, which better captures the adversary’s adaptiveness to the player’s behavior. In a setting where losses are allowed to drift, we...
متن کاملMulti-Armed Bandits with Metric Movement Costs
We consider the non-stochastic Multi-Armed Bandit problem in a setting where there is a fixed and known metric on the action space that determines a cost for switching between any pair of actions. The loss of the online learner has two components: the first is the usual loss of the selected actions, and the second is an additional loss due to switching between actions. Our main contribution giv...
متن کاملSpecWatch: A Framework for Adversarial Spectrum Monitoring with Unknown Statistics
In cognitive radio networks (CRNs), dynamic spectrum access has been proposed to improve the spectrum utilization, but it also generates spectrum misuse problems. One common solution to these problems is to deploy monitors to detect misbehaviors on certain channel. However, in multi-channel CRNs, it is very costly to deploy monitors on every channel. With a limited number of monitors, we have t...
متن کاملOnline learning with graph-structured feedback against adaptive adversaries
We derive upper and lower bounds for the policy regret of T -round online learning problems with graph-structured feedback, where the adversary is nonoblivious but assumed to have a bounded memory. We obtain upper bounds of Õ(T ) and Õ(T ) for strongly-observable and weakly-observable graphs, respectively, based on analyzing a variant of the Exp3 algorithm. When the adversary is allowed a bound...
متن کاملOnline Regret Bounds for Markov Decision Processes with Deterministic Transitions
We consider an upper confidence bound algorithm for Markov decision processes (MDPs) with deterministic transitions. For this algorithm we derive upper bounds on the online regret (with respect to an (ε-)optimal policy) that are logarithmic in the number of steps taken. These bounds also match known asymptotic bounds for the general MDP setting. We also present corresponding lower bounds. As an...
متن کامل